Conversational speech recognition using acoustic and articulatory input
Abstract
The combination of multiple speech recognizers based on different signal representations is attracting increasing interest in the speech community. In previous work, we presented a hybrid speech recognition system that combined acoustic and articulatory information and achieved significant word error rate reductions under highly noisy conditions on a small-vocabulary numbers recognition task. In this study, we extend this approach to large-vocabulary conversational speech recognition using the Gaussian mixture acoustic modeling paradigm. We demonstrate that the proposed articulatory input representation contains information that is complementary to that provided by standard MFCC features, and that combining the two can significantly reduce the word error rate on conversational speech. We compare and evaluate three combination strategies: feature-level, state-level, and word-level combination.
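The abstract names three levels at which the MFCC and articulatory streams can be merged but does not say how each is realized. The Python sketch below shows one common reading of each level. These are our assumptions, not the paper's method: frame-synchronous concatenation for feature-level combination, a weighted sum of per-state log-likelihoods (multi-stream style) for state-level combination, and a per-slot majority vote over already-aligned word hypotheses (a simplified ROVER) for word-level combination. The function names, the stream weight of 0.5, and the feature dimensions (39 MFCC, 8 articulatory) are hypothetical.

```python
import numpy as np
from collections import Counter


def feature_level(mfcc: np.ndarray, artic: np.ndarray) -> np.ndarray:
    """Feature-level combination (assumed): concatenate frame-synchronous vectors.

    mfcc: (T, d1) array, artic: (T, d2) array -> (T, d1 + d2) array.
    Assumes both streams share the same frame rate and length T.
    """
    if mfcc.shape[0] != artic.shape[0]:
        raise ValueError("streams must have the same number of frames")
    return np.concatenate([mfcc, artic], axis=1)


def state_level(loglik_mfcc: np.ndarray, loglik_artic: np.ndarray,
                weight: float = 0.5) -> np.ndarray:
    """State-level combination (assumed): weighted sum of per-state log-likelihoods.

    Equivalent to raising each stream's likelihood to an exponent
    (weight and 1 - weight) and multiplying. The default weight is hypothetical.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * loglik_mfcc + (1.0 - weight) * loglik_artic


def word_level(hypotheses: list[list[str]]) -> list[str]:
    """Word-level combination (assumed): per-slot majority vote, a simplified ROVER.

    Assumes the hypotheses are already aligned to equal length, with ""
    marking a deletion. Ties go to the earliest system in the list.
    """
    length = len(hypotheses[0])
    if any(len(h) != length for h in hypotheses):
        raise ValueError("hypotheses must be pre-aligned to equal length")
    voted = [Counter(slot).most_common(1)[0][0] for slot in zip(*hypotheses)]
    return [w for w in voted if w]  # drop slots where the deletion won the vote


if __name__ == "__main__":
    frames = feature_level(np.zeros((100, 39)), np.ones((100, 8)))
    print(frames.shape)  # (100, 47)
    print(state_level(np.array([-10.0, -20.0]), np.array([-30.0, -10.0])))  # [-20. -15.]
    print(word_level([["one", "two", "three"],
                      ["one", "too", "three"],
                      ["won", "two", "three"]]))  # ['one', 'two', 'three']
```

The three levels trade off differently: concatenation lets a single Gaussian mixture model capture correlations between the streams but enlarges the feature space; the state-level weight could be tuned on held-out data; and word-level voting needs a separate alignment step (omitted here) before it can be applied to real recognizer outputs.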
Similar resources
Combining acoustic and articulatory feature information for robust speech recognition
The idea of using articulatory representations for automatic speech recognition (ASR) continues to attract much attention in the speech community. Representations which are grouped under the label “articulatory” include articulatory parameters derived by means of acoustic-articulatory transformations (inverse filtering), direct physical measurements or classification scores for pseudo-articul...
Articulatory information and Multiview Features for Large Vocabulary Continuous Speech Recognition
This paper explores the use of multi-view features and their discriminative transforms in a convolutional deep neural network (CNN) architecture for a continuous large vocabulary speech recognition task. Mel-filterbank energies and perceptually motivated forced damped oscillator coefficient (DOC) features are used after feature-space maximum-likelihood linear regression (fMLLR) transforms, whic...
Tonal articulatory feature for Mandarin and its application to conversational LVCSR
This paper presents our recent work on the development of a tonal Articulatory Feature (AF) for Mandarin and its application to conversational LVCSR. Motivated by the theory of Mandarin phonology, eight features for classifying the acoustic units and one feature for classifying the tone are investigated and constructed in the paper, and the AF-based tandem approach is used to improve speech rec...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Using multiple acoustic feature sets for speech recognition
In this paper, the use of multiple acoustic feature sets for speech recognition is investigated. The combination of both auditory and articulatorily motivated features is considered. In addition to a voicing feature, we introduce a recently developed articulatory motivated feature, the spectrum derivative feature. Features are combined both directly using linear discriminant analysis (LDA)...
Publication date: 2000